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Abstract  of  Thesis  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Master  of  Science 

An  Algorithm  to  Translate  Relational 
Algebra  Queries  into  QUEL 

by 

Dennis  F.  Blumenthal 
May  1985 

Chairman:  Dr.  Stanley  Y.  W.  Su 

Major  Department:  Computer  and  Information  Sciences 

An- algorithm  for  translating  relational  algebra  query 
trees  into  QUEL  is  pre-sented.  The  algorithm  iS?used  to 
process  queries  in  a  distributed,  heterogeneous  data  base 
management  system  used  to  support  computer  integrated 
manufacturing.  A  node-by-node  translation  algorithm  as  well 
as  an  optimal  algorithm  which  uses  minimization  techniques 
is  presented.  A  general  discussion  about  implementation  is 
also  included. 


Chairman 


CHAPTER  I 
INTRODUCTION 


Computer  integrated  manufacturing  (CIM)  is  an  important 
goal  of  current  manufacturing  research  underway  at  the 
National  Bureau  of  Standards  (NBS) .  A  general  discussion  of 
that  work  is  found  in  [Sim82],  The  objective  of  CIM  is  to 
automate  all  phases  of  manufacturing,  ranging  from  engi¬ 
neering  design  and  analysis,  to  parts  machining  and 
materials  handling,  and  to  integrate  them  into  a  system 
which  also  supports  automated  management  functions,  such  as, 
process  planning,  shop  scheduling,  and  inventory  control. 
Achieving  this  goal  will  undoubtedly  facilitate  factory 
management,  permit  more  even  utilization  of  resources,  and 
result  in  increased  productivity  and  reduced  operating 
costs.  Comprehensive  introductory  material  to  batch 
manufacturing  is  found  in  selected  articles  published  in 
[  IEE83 ] . 

To  accomplish  CIM,  it  is  essential  to  integrate  and 
manage  the  large  body  of  data  which  comprises  the  manufac¬ 
turing  database.  This  database  consists  of  many  types  of 
data  which  cannot  be  easily  included  in  a  single  database 
management  system  (DBMS) .  Current  automated  manufacturing 
systems  utilize  autonomous  data  management  resources  of 
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varying  complexity,  ranging  from  simple  file  management 
systems  to  sophisticated  database  management  systems.  These 
systems  usually  differ  from  each  other,  not  only  in  their 
internal  data  representation,  but  more  importantly  in  their 
underlying  data  models,  data  definition  languages  (DDL)  and 
data  manipulation  languages  (DML) .  Discussions  about  DBMS 
architectures  for  manufacturing  systems  are  found  in  [Bee83] 
and  [Mac83].  A  discussion  on  the  use  of  homogeneous  and 
heterogeneous,  distributed  data  management  systems  is  found 
in  [Bee83].  In  [NBS85]  a  heterogeneous,  distributed  DBMS, 
called  Integrated  Manufacturing  Data  Administration  System 
(IMDAS) ,  is  proposed  for  the  Automated  Manufacturing 
Research  Facility  ( AMRF) .  A  description  of  the  AMRF  is 
found  in  [Sim82],  [Mac82] ,  and  [Mac83] . 

In  order  to  share  a  heterogeneous  database  through  a 
network,  it  must  be  represented  through  a  global  schema  that 
incorporates  logical  units  which  can  differ  in  local  struc¬ 
ture  and  content.  The  IMDAS  architecture  of  [NBS85] 
introduces  a  three-view  concept  of  distributed  data  and  is 
shown  in  Figure  1.  A  global  external  view  is  a  given 
component  system's  view  of  the  distributed  database.  It 
contains  entities  and  associations  of  interest  to  the 
component  system.  The  global  conceptual  view  is  the  inte¬ 
gration  of  the  global  external  views  and  is  the  view  of  all 
the  data  which  comprises  the  manufacturing  database. 
Fragmented  views  certain  occurrences  of  the  global  concep¬ 
tual  entities  and  associations  which  are  partitioned  or 
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replicated  across  the  component  systems.  Ultimately,  any 
fragmented  view  is  mapped  into  the  particular  internal 
representation  of  the  local  subsystems  which  support  it. 


Figure  1:  Distributed  database  architecture. 

When  a  component  or  user  needs  to  access  data  in  the 
manufacturing  database,  it  issues  a  network  query,  in  a 
common  language,  against  one  of  the  global  external  views, 
which  then  undergoes  translation  so  it  can  be  processed  by 
one  or  more  local  data  management  subsystems.  We  note  here, 
that  a  distributed  system  without  a  common  query  language 
and  n  local  subsystems,  requires  a  total  of  (n-l)n  trans¬ 
lators,  while  an  equivalent  system  with  a  common  query 
language  only  requires  a  total  of  2n  translators. 

The  work  presented  in  this/paper  deals  with  the 
development  of  an  algorithm  required  to  translate  queries 

'  1  •'  ’  "  '  *  '•  '  •  •  '  '  s 

from  IMDAS intermediate  representation,  relational  algebra, 
into  QUEL,  the  query  language  of  the  INGRES  subsystem. 
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RANGE  OF  r  IS  PART 

RETRIEVE  INTO  TEMP  ( r . ALL )  WHERE  r. weight  <  15 
Figure  19:  Selection  example  in  QUEL. 

Union 

Given  two  relations,  R  and  S,  which  are  union  compati¬ 
ble,  it  is  possible  to  derive  a  relation,  T,  such  that 
R  and  S  are  partitions  of  T.  The  union  operation  can  be 
accomplished  in  QUEL  as  shown  in  Figure  20. 


RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 
RETRIEVE  INTO  X  (r.ALL) 
APPEND  TO  X  ( s .ALL) 
RANGE  OF  t  IS  X 
RETRIEVE  INTO  T  ( t . ALL) 

Figure  20:  Union  in  QUEL. 


The  last  two  lines  in  Figure  20  are  used  to  eliminate 
duplicate  tuples  which  may  be  present  after  the  execution  of 
the  APPEND  statement.  Duplicate  tuples  are  only  allowed  in 
INGRES  when  relations  are  stored  as  heaps,  as  is  the  case 
with  temporary  relations. 

The  QUEL  sequence  for  PARTI  UNION  PART2  is  shown  in 
Figure  21. 


RANGE  OF  r  IS  FART1 
RANGE  OF  s  IS  PART 2 
RETRIEVE  INTO  TEMPI  (r.ALL) 
APPEND  TO  TEMPI  ( s . ALL) 
RANGE  OF  t  IS  TEMPI 
RETRIEVE  INTO  TEMP 2  ( t . ALL ) 

Figure  21:  Union  example  in  QUEL. 
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RANGE  OF  r  IS  R 

RETRIEVE  INTO  S  (r.Al,  r.A2,  .  .  .  ,  r.Ak) 

Figure  16:  Projection  in  QUEL. 

The  QUEL  sequence  for  the  example,  PART [color,  partno] , 
presented  in  Chapter  II,  is  shown  in  Figure  17. 

RANGE  OF  r  IS  PART 

RETRIEVE  INTO  TEMP  (r. color,  r. partno) 

Figure  17:  Projection  example  in  QUEL. 

Selection 

Given  a  relation,  P.,  it  is  possible  to  derive  a  rela¬ 
tion,  S,  such  that  its  tuples  are  a  horizontal  subset  of  R, 
for  which  the  selection  criterion,  Q,  is  true.  Q  is  a 
logical  predicate  which  compares  at  least  one  attribute  of  R 
with  another  attribute  of  R  or  with  a  constant.  The  selec¬ 
tion  operation  can  be  accomplished  in  QUEL  with  the  sequence 
shown  in  Figure  18. 


RANGE  OF  r  IS  R 

RETRIEVE  INTO  S  ( r . ALL )  WHERE  Q 
Figure  18:  Selection  in  QUEL. 


The  QUEL  sequence  for  PART  WHERE  PART. weight  <  15  is 


shown  in  Figure  19. 


CHAPTER  III 

ALGEBRAIC  OPERATIONS  WITH  QUEL 


This  chapter  presents  the  QUEL  sequences  necessary  to 
accomplish  all  the  algebraic  operations  presented  in  Chap¬ 
ter  II.  QUEL,  the  query  language  of  INGRES,  is  based  on  the 
relational  calculus  and  is  relationally  complete  as  proven 
in  [Cod72].  The  syntax  for  QUEL  is  found  in  [Sto76]  and 
[W008I] ,  and  is  summarized  in  graphical  form  in  the  Appen¬ 
dix,  a  subset  of  [RTI84] .  Although  QUEL  is  relationally 
complete,  it  is  not  always  possible  to  represent  each 
algebraic  operation  with  a  single  QUEL  statement.  The  proof 
that  the  QUEL  sequences  presented  in  this  chapter  accomplish 
the  relational  operations  is  not  presented  but  can  be 
derived  from  the  material  presented  in  [U1182] . 


Projection 


Given  a  relation,  R,  with  n  attributes,  it  is  possible 
to  derive  a  relation,  S,  such  that  its  tuples  are  defined  on 
a  vertical  subset  of  attributes,  A1,A2,  .  .  .  Ak,  belonging 
to  R.  The  projection  operation  can  be  accomplished  in  QUEL 


as  shown  in  Figure  16. 
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(pi,  nut,  red,  12,  London,  si.  Smith,  20,  London) 
(pi,  nut,  red,  12,  London,  s4,  Clark,  20,  London) 


(P2, 

bolt,  green, 

17 

,  Paris, 

s2, 

Jones , 

10, 

Paris) 

(p2. 

bolt,  green. 

17 

,  Paris, 

S3, 

Blake, 

30, 

Paris) 

(p4. 

screw,  red. 

14, 

London, 

si. 

Smith , 

20, 

London) 

(p4, 

screw,  red, 

14, 

London , 

s4. 

Clark, 

20, 

London) 

(p5. 

cam, 

blue. 

12 

,  Paris, 

s2. 

Jones , 

10, 

Paris) 

(P5, 

cam, 

blue, 

12 

,  Paris, 

s3. 

Blake, 

30, 

Paris) 

(P6, 

cog. 

red , 

19, 

London , 

si , 

Smith, 

20, 

London) 

(P6, 

cog. 

red. 

19, 

London , 

s4 , 

Clark , 

20, 

London) 

Figure  14:  Tuples  of  PART  JOIN  SUPPLIER  WHERE 
PART. city  =  SUPPLIER. city. 


For  example,  PART  NJOIN  SUPPLIER  OVER  city  yields  the 
tuples  shown  in  Figure  15  which  is  similar  to 

PART  JOIN  SUPPLIER  WHERE  PART. city  =  SUPPLIER. city 
except  that  the  attribute,  city,  only  appears  once. 


(pi,  nut,  red,  12,  London,  si,  Smith,  20) 

(pi,  nut,  red,  12,  London,  s4,  Clark,  20) 

(p2,  bolt,  green,  17,  Paris,  s2 ,  Jones,  10) 
(p2,  bolt,  green,  17,  Paris,  s3,  Blake,  30) 
(p4 ,  screw,  red,  14,  London,  si.  Smith,  20) 
( p 4 ,  screw,  red,  14,  London,  s4,  Clark,  20) 
(p5,  cam,  blue,  12,  Paris,  s2 ,  Jones,  10) 

(p5,  cam,  blue,  12,  Paris,  s3,  Blake,  30) 

(p6,  cog,  red,  19,  London,  si,  Smith,  20) 

(p6 ,  cog,  red,  19,  London,  s4,  Clark,  20) 


Figure  15: 


Tuples  of  PART  NJOIN  SUPPLIER  OVER  city. 


(si.  Smith  20,  London,  pi,  nut,  red,  12,  London) 

(si,  Smith  20,  London,  p2,  bolt,  green,  17,  Paris) 

(si.  Smith  20,  London,  p3 ,  screw,  blue,  17,  Rome) 

(si.  Smith  20,  London,  p4,  screw,  red,  14,  London) 

(si,  Smith  20,  London,  p5 ,  cam,  blue,  12,  Paris) 

(si.  Smith  20,  London,  p6,  cog,  red,  19,  London) 

(s2,  Jones,  10,  Paris,  pi,  nut,  red,  12,  London) 

(s2,  Jones,  10,  Paris,  p2,  bolt,  green,  17,  Paris) 

(s2,  Jones,  10,  Paris,  p3 ,  screw,  blue,  17,  Rome) 

(s2,  Jones,  10,  Paris,  p4,  screw,  red,  14,  London) 

(s2,  Jones,  10,  Paris,  p5,  cam,  blue,  12,  Paris) 

(s2,  Jones,  10,  Paris,  p6,  cog,  red,  19,  London) 

(s3,  Blake,  30,  Paris,  pi,  nut,  red,  12,  London) 

(s3,  Blake,  30,  Paris,  p2,  bolt,  green,  17,  Paris) 

(s3,  Blake,  30,  Paris,  p3 ,  screw,  blue,  17,  Rome) 

(s3,  Blake,  30,  Paris,  p4,  screw,  red,  14,  London) 

(s3,  Blake,  30,  Paris,  p5,  cam,  blue,  12,  Paris) 

(s3,  Blake,  30,  Paris,  p6,  cog,  red,  19,  London) 

(s4,  Clark,  20,  London,  pi,  nut,  red,  12,  London) 

(s4,  Clark,  20,  London,  p2,  bolt,  green,  17,  Paris) 

(s4,  Clark,  20,  London,  p3 ,  screw,  blue,  17,  Rome) 

(s4,  Clark,  20,  London,  p4,  screw,  red,  14,  London) 

(s4,  Clark,  20,  London,  p5 ,  cam,  blue,  12,  Paris) 

(s4,  Clark,  20,  London,  p6,  cog,  red,  19,  London) 

(s5,  Adams,  30,  Athens,  pi,  nut,  red,  12,  London) 

(s5,  Adams,  30,  Athens,  p2,  bolt,  green,  17,  Paris) 

(s5,  Adams,  30,  Athens,  p3 ,  screw,  blue,  17,  Rome) 

(s5,  Adams,  30,  Athens,  p4,  screw,  red,  14,  London) 

(s5,  Adams,  30,  Athens,  p5,  cam,  blue,  12,  Paris) 

(s5,  Adams,  30,  Athens,  p6,  cog,  red,  19,  London) 


Figure  13:  Tuples  of  SUPPLIER  TIMES  PART. 


(pl. 

nut,  red,  12 

,  London , 

si.  Smith, 

20, 

London) 

(pi. 

nut,  red,  12 

,  London, 

s2,  Jones, 

10, 

Paris) 

(pl. 

nut,  red,  12 

,  London , 

s3,  Blake, 

30, 

Paris) 

(pl. 

nut,  red,  12 

,  London , 

s4,  Clark, 

20, 

London) 

(pl. 

nut,  red,  12 

,  London, 

s5,  Adams, 

30, 

Athens) 

(p2. 

bolt,  green, 

17, 

Paris 

,  si,  Smith 

,  20 

,  London) 

(p2. 

bolt,  green, 

17, 

Paris 

,  s2,  Jones 

,  10 

,  Paris) 

(p2 , 

bolt,  green. 

17, 

Paris 

,  s3,  Blake 

,  30 

,  Paris) 

(p2 , 

bolt,  green, 

17, 

Paris 

,  s4,  Clark 

,  20 

,  London) 

( p2 , 

bolt,  green. 

17, 

Paris 

,  s5,  Adams 

,  30 

,  Athens) 

( p  3  , 

screw,  blue, 

17, 

Rome , 

si.  Smith, 

20, 

London) 

(p3. 

screw,  blue, 

17, 

Rome , 

s2,  Jones, 

10, 

Paris) 

<P3  , 

screw,  blue, 

17, 

Rome , 

s3 ,  Blake, 

30, 

Paris) 

(p3  , 

screw,  blue, 

17, 

Rome , 

s4,  Clark, 

20, 

London) 

(p3  , 

screw,  blue, 

17, 

Rome , 

s5,  Adams, 

30, 

Athens) 

( P  4 , 

screw. 

red, 

14, 

London , 

si. 

Smith, 

20, 

London) 

(p4. 

screw. 

red. 

14, 

London , 

s2 , 

Jones , 

10, 

Paris) 

(p4. 

screw. 

red. 

14, 

London, 

s3, 

Blake , 

30, 

Paris) 

(p4 , 

screw. 

red , 

14, 

London , 

s4. 

Clark , 

20, 

London) 

(p4. 

screw. 

red. 

14, 

London, 

s5 , 

Adams , 

30, 

Athens) 

(p5. 

cam, 

blue,  12,  Paris, 

si , 

Smith, 

20, 

London) 

(p5. 

cam. 

blue,  12,  Paris, 

s2, 

Jones, 

10, 

Paris) 

(p5 , 

cam. 

blue,  12,  Paris, 

s3 , 

Blake , 

30, 

Paris) 

(p5. 

cam. 

blue,  12,  Paris, 

s4 , 

Clark, 

20, 

London) 

(P5  , 

cam, 

blue,  12,  Paris, 

s5 , 

Adams , 

30, 

Athens) 

(p6, 

cog. 

red,  19,  London, 

si. 

Smith, 

20, 

London) 

(p6  , 

cog, 

red,  19,  London, 

s2. 

Jones , 

10, 

Paris) 

(p6. 

cog. 

red,  19,  London, 

s3 , 

Blake , 

30, 

Paris) 

(p6. 

cog. 

red,  19,  London, 

s4 , 

Clark, 

20, 

London) 

(p6. 

cog. 

red,  19,  London, 

s5, 

Adams , 

30  , 

Athens) 

Figure  12:  Tuples  of  PART  TIMES  SUPPLIER. 


Conversely,  SUPPLIER  TIMES  PART  contains  the  tuples 
shown  in  Figure  13. 

Note  that  the  product  is  not  commutative  since  PART 
TIMES  SUPPLIER  is  not  equal  to  SUPPLIER  TIMES  PART. 

Join 

The  join  of  R  and  S,  denoted  by  R  JOIN  S,  is  a  subset 
of  the  product  obtained  by  selecting  tuples  from  the  product 
which  satisfy  the  join  predicate.  The  join  predicate  is  a 
qualification  statement  which  contains  at  least  one  pair  of 
attributes  from  each  of  the  operand  relations,  connected  by 
one  of  the  relational  comparison  operators:  equal,  greater 
than,  less  than,  not  equal,  less  than  or  equal,  and  greater 
than  or  equal. 

For  example,  PART  JOIN  SUPPLIER  WHERE  PART. city  = 
SUPPLIER. city  yields  the  tuples  shown  in  Figure  14. 

Note  that  the  attribute  city  appears  twice  in  the 
result. 


Natural  Join 

The  natural  join  of  R  and  S,  denoted  by  R  NJOIN  S,  is  a 
special  case  of  the  join  operation.  The  natural  join 
differs  from  the  join  in  that  only  the  equality  comparison 
is  used  in  the  predicate  and  duplicate  attributes  referenced 
in  the  predicate  are  omitted — all  the  other  columns  not 
specifically  referenced  in  the  predicate  are  not  omitted. 
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For  example,  PART  MINUS  PARTI  yields  the  tuples  shown 
in  Figure  10.  Conversely,  PARTI  MINUS  PART  yields  an  empty 
set,  since  all  the  tuples  from  PARTI  also  exist  in  PART. 


(pi,  nut,  red,  12,  London) 

(p4,  screw,  red,  14,  London) 

(p5,  cam,  blue,  12,  Paris) 

Figure  10:  Tuples  of  PART  MINUS  PARTI. 


Product 

Given  R  and  S,  relations  of  arity  and  k 2  respec¬ 
tively,  the  product  of  R  and  S,  denoted  R  TIMES  S,  is  a 
relation  of  arity  k^  +  k 2  containing  the  set  of  tuples  whose 
first  kj  components  form  a  tuple  in  R  and  whose  last  k2 
components  form  a  tuple  in  S. 

For  example,  if  the  relation  SUPPLIER  defined  as: 

SUPPLIER (supno,  sname,  status,  city) 
contains  the  tuples  shown  in  Figure  11,  then  PART  TIMES 
SUPPLIER  contains  the  tuples  shown  in  Figure  12. 


(si. 

Smith, 

20, 

London) 

( s2 , 

Jones , 

10, 

Paris) 

(S3, 

Blake , 

30, 

Paris) 

(s4 , 

Clark, 

20, 

London) 

( s  5 , 

Adams , 

30  , 

Athens) 

Figure  11:  Tuples  of  SUPPLIER. 
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Intersection 

If  R  and  S  are  two  union  compatible  relations  of 
arity  k,  then  the  intersection  of  R  and  S,  denoted  by 
R  INTERSECTION  S,  is  the  set  of  k-tuples  which  belong  to 
both  R  and  S. 

For  example,  if  PART 3  contains  the  tuples  shown  in 
Figure  8  then  PART2  INTERSECTION  PART3  yields  the  tuples 
shown  in  Figure  9. 


(pi,  nut,  red,  12,  London) 

(p5,  cam,  blue,  12,  Paris) 

(p7,  shaft,  black,  20,  Brussels) 
(p8,  lug,  brown,  15,  Paris) 

(plO ,  washer,  red,  10,  London) 

Figure  8:  Tuples  of  PART3 . 


(pi,  nut,  red,  12,  London) 

(p5,  cam,  blue,  12,  Paris) 

Figure  9:  Tuples  of  PART2  INTERSECTION  PART 3 . 

Difference 

If  R  and  S  are  two  union  compatible  relations  of 
arity  k,  then  the  difference  between  relations  R  and  S, 
denoted  R  MINUS  S,  is  the  set  of  all  tuples  which  are 
contained  in  R  but  not  in  S.  Conversely,  S  MINUS  R,  is  th 
set  of  all  tuples  which  are  contained  in  S  but  not  in  R. 
follows  that  DIFFERENCE  is  non-commutative ,  i.e.  S  MINUS 
P.  4  R  MINUS  S. 
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when  referencing  attributes  of  the  same  name  from  different 
relations. 


Union 

The  union  of  relations  R  and  S,  denoted  R  UNION  S,  is 
the  set  of  tuples  that  belong  to  R,  S,  or  both.  The  union 
operation  can  only  be  performed  when  both  relations  are 
union  compatible,  that  is,  they  have  equal  arity  and  corre¬ 
sponding  attributes  are  drawn  on  compatible  domains.  The 
result  has  the  same  arity  as  the  operands  and  its  attribute 
names  are  the  same  as  those  found  in  the  first  operand. 

For  example,  if  PARTI  =  PART  WHERE  PART. weight  >  15 
contains  the  tuples  shown  in  Figure  6a  and  PART 2  =  PART 
WHERE  PART. weight  <  15  contains  the  tuples  shown  in 
Figure  6b,  then  PARTI  UNION  PART 2  has  the  tuples  shown  in 
rigure  7. 


(p2,  bolt,  green,  17,  Paris) 
(p3,  screw,  blue,  17,  Rome) 
(p6 ,  cog,  red,  19,  London) 

(a) 


(pi,  nut,  red,  12  London) 
(p4,  screw,  red,  14,  London) 
(p5,  cam,  blue,  12,  Paris) 

(b) 


Figure  6:  Tuples  of  PARTI  and  PART2 . 


(p2,  bolt,  green,  17,  Paris) 

( p3 ,  screw,  blue,  17,  Rome) 

(p6,  cog,  red,  19,  London) 

(pi,  nut,  red,  12,  London) 

(p4,  screw,  red,  14,  London) 

(p5,  cam,  blue,  12,  Paris) 

Figure  7:  Tuples  of  PARTI  UNION  PART 2 . 
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For  example,  PART [color,  partno]  yields  the  tuples 
shown  in  Figure  4. 

(red,  pi) 

(green,  p2) 

(blue,  p3 ) 

(red,  p4) 

(blue,  p5) 

(red,  p6 ) 

Figure  4:  Tuples  of  PART[color,  partno]. 

Selection 

Given  a  relation  R  of  arity  k,  a  horizontal  subset  can 
be  defined  on  R,  such  that  the  occurrences  of  certain 
components  of  the  subset  satisfy  a  logical  predicate  which 
references  only  attributes  of  R  [U1182] .  The  selection  on  R 
is  denoted  R  WHERE  p,  in  which  p  is  the  logical  predicate 
and  is  also  known  as  the  selection  criterion. 

For  example,  using  the  PART  relation,  PART  WHERE  PART, 
weight  <  15  corresponds  to  the  tuples  of  parts  which  weigh 
less  that  15  units,  i.e.  the  tuples  shown  in  Figure  5. 

(pi,  nut,  red,  12,  London) 

(p4,  screw,  red,  14,  London) 

(p5,  cam,  blue,  12,  Paris) 

Figure  5:  Tuples  of  PART  WHERE  PART. weight  <  15. 

Note  that  the  attributes  of  a  relation  are  referenced 
using  qualified  names,  i.e.  the  relation  name,  followed  by  a 
period,  followed  by  the  actual  attribute  name.  The  purpose 
of  using  qualified  names  is  to  minimize  ambiguity,  such  as. 
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arity  5,  could  consist  of  the  tuples  shown  in  Figure  3  and 
is  expressed  empirically  as: 

PART  (partno,  pname,  color,  weight,  city) 
where  PART  is  defined  on  the  f-mappings  of  Figure  2. 

(pi,  nut,  red,  12,  London) 

(p2,  bolt,  green,  17,  Paris) 

(p3,  screw,  blue,  17,  Rome) 

(p4,  screw,  red,  14,  London) 

(p5,  cam,  blue,  12,  Paris) 

(p6,  cog,  red,  19,  London) 

Figure  3:  Tuples  of  relation  PART. 

The  algebraic  operators  which  are  used  to  manipulate 
relations  are:  selection,  projection,  union,  intersection, 
difference,  product,  join,  and  natural  join.  Depending  on 
the  number  of  operands,  these  algebraic  operators  are 
classified  into  unary  and  binary.  Projection  and  selection 
are  unary,  i.e.  they  use  one  operand.  The  other  operators 
are  binary,  i.e.  they  use  two  operands. 


Projection 


Given  a  relation  R  of  arity  k,  the  projection  of  one  or 


more  components  of  R,  denoted  by  Rfi^i^,  .  .  .  imI  ,  where 

m  <  k,  is  the  set  of  m-tuples,  a,a0  .  .  .a  such  that  there 
—  12  m 

is  some  k-tuple  b..b.,  .  .  .  b,  in  R  for  which  a.  =  b.  .  for 

12  k  3  lj 

j  =  1,  2,  .  .  .  ,  m  [U1182].  In  simpler  terms,  we  take  a 


relation,  remove  some  of  its  columns  and/or  permute  some  of 
the  remaining  columns. 


CHAPTER  II 
RELATIONAL  ALGEBRA 


The  purpose  of  this  chapter  is  to  present  the 
relational  algebra  notation  that  will  be  used  in  future 
discussions  leading  to  the  translation  algorithm.  The 
presentation  of  this  notation  is  meant  to  be  brief  and  does 
not  cover  all  aspects  of  the  relational  algebra. 

A  tuple  is  defined  in  [U1182]  as  a  mapping  from  attri¬ 
bute  names  to  values  in  the  domains  of  the  attributes.  An 
example  of  a  tuple  is: 

(pi,  nut,  red,  12,  London) 

and  the  mapping  f  which  defines  that  tuple  is  shown  in 
Figure  2 . 


f(partno)  =  pi 
f(pname)  =  nut 
f (color)  =  red 
f (weight)  =  12 
f(city)  =  London 

Figure  2:  The  mapping  f. 


Tuples  can  be  grouped  into  relations.  A  relation  is 
defined  in  [U1I82]  as  a  set  of  k-tuples,  where  k  is  fixed 
and  is  known  as  the  arity  of  the  relation.  The  arity  of  the 
tuple  in  the  above  example  is  5.  A  relation,  PART,  of 


Network  queries  made  in  a  global  data  manipulation  language 
( GDML )  are  transformed  into  relational  algebra  query  trees 
which  are  in  turn  partitioned  for  processing  by  the  perti¬ 
nent  subsystems.  The  translation  algorithm,  to  be  described 
in  this  paper,  maps  the  relational  algebra  into  QUEL.  The 
significance  of  this  work  is  that  it  lays  a  foundation  to 
develop  other  translation  algorithms  which  can  be  used  to 
map  the  GDML  into  other  subsystems.  Relevant  work  dealing 
with  this  type  of  translation  is  found  in  [Cer  in  press] . 
Material  which  justifies  the  use  of  relational  algebra  as 
IMDAS '  intermediate  representation  can  be  found  in  [Klu80] 
and  [Su81] . 

The  work  is  organized  as  follows.  The  material  in 
Chapter  II,  based  on  [Cod72] ,  [Dat82] ,  and  [U1182] ,  intro¬ 
duces  the  notation  and  definitions  of  the  relational 
algebra.  Chapter  III  develops  the  QUEL  statements  which 
accomplish  all  the  algebraic  operators  using  [U1182]  and 
[W008I].  Chapter  IV  develops  a  node-by-node  algorithm  to 
translate  relational  algebra  query  trees  into  QUEL.  Using 
[Chu82],  Chapter  V  discusses  minimization  and  presents  an 
optimized  translation  algorithm.  Chapter  VI  discusses 
implementation  of  the  latter  algorithm.  Chapter  VII  gives 
the  concluding  remarks  about  this  work. 


Intersection 


Given  two  relations,  R  and  S,  which  are  union  compati¬ 
ble,  it  is  possible  to  derive  a  relation,  T,  such  that  all 
its  tuples  exist,  both,  in  R  and  in  S.  The  intersection 
operation  can  be  accomplished  in  QUEL  as  shown  in  Figure  22. 
Note  that  A1 ,  A2 ,  .  .  .  Ai  are  the  attributes  of  R,  and  Bl, 
B2,  .  .  .  Bi  are  the  attributes  of  S. 


RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 

RETRIEVE  INTO  T  ( r . ALL )  WHERE  r.Al  =  S.B1 
AND  r.A2  =  s.B2  .  .  .  AND  r.Ai  =  s.Bi 

Figure  22:  Intersection  in  QUEL. 


The  QUEL  sequence  for  PART 2  INTERSECTION  PART 3  is  shown 
in  Figure  23. 


RANGE  OF  r  IS  PARTI 
RANGE  OF  s  IS  PART3 
RETRIEVE  INTO  TEMP  ( r . ALL ) 

WHERE  r.partno  =  s.partno 
AND  r.pname  =  s.pname 
AND  r. color  =  s. color 
AND  r. weight  =  s. weight 
AND  r.city  =  s.city 

Figure  23:  Intersection  example  in  QUEL. 


Note  that  comparisons  using  all  corresponding  attribute 
name  pairs  are  built  into  the  WHERE  clause  to  test  for  tuple 
equivalency . 


Difference 

Given  two  relations,  R  and  S,  which  are  union  compati¬ 
ble,  it  is  possible  to  derive  a  difference  relation,  T,  such 
that  all  its  tuples,  except  those  tuples  which  already  exist 
in  S,  are  taken  from  R.  The  difference  operation  can  be 
accomplished  in  QUEL  as  shown  in  Figure  24.  Note  that  Al , 

A2 ,  .  .  .  Ai  are  the  attributes  of  R,  and  Bl,  B2,  .  .  .  Bi 
are  the  attributes  of  S. 


RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 
RETRIEVE  INTO  T  ( r . ALL ) 

RANGE  OF  t  IS  T 

DELETE  t  WHERE  t.Al  =  s.Bl 

AND  t.A2  =  s.B2  .  .  .  AND  t.Ai  =  s.Bi 

Figure  24:  Difference  in  QUEL. 


The  QUEL  sequence  for  PART  MINUS  PARTI  is  shown  in 
Figure  25. 


RANGE  OF  r  IS  PART 
RANGE  OF  s  IS  PARTI 
RETRIEVE  INTO  TEMP  ( r . ALL) 

RANGE  OF  t  IS  TEMP 

DELETE  t  WHERE  t.partno  =  s.partno 

AND  t.pname  =  s.pname 

AND  t. color  *  s. color 

AND  t. weight  =  s. weight 

AND  t.city  =  s.city 

Figure  25:  Difference  example  in  QUEL. 


Note  that  tuple  equivalency  is  achieved  the  same  way  as 


in  the  intersection  operation. 
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Product 

Given  two  relations,  R  and  S,  it  is  possible  to  derive 
a  relation,  T,  such  that  all  its  tuples  are  formed  by  making 
all  possible  combinations  taking  one  tuple  from  R  and  one 
tuple  from  S  and  concatenating  them.  The  product  operation 
can  be  accomplished  in  QUEL  as  shown  in  Figure  26. 

RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 
RETRIEVE  INTO  T  ( r . ALL ,  s . ALL) 

Figure  26:  Product  in  QUEL. 

The  QUEL  sequence  for  PART  TIMES  SUPPLIER  is  shown  in 
Figure  27. 


RANGE  OF  r  IS  PART 
RANGE  OF  s  IS  SUPPLIER 
RETRIEVE  INTO  TEMP  ( r . ALL ,  s.supno, 
s.sname,  s. status,  city2  =  s.city) 

Figure  27:  Product  example  in  QUEL. 

Note  that  when  duplicate  attribute  names  are  present, 
e.g.  city,  they  must  be  aliased  to  a  unique  name  in  the 
result.  This  is  because  INGRES  does  not  allow  duplicate 
attribute  names  within  the  same  relation.  If  all  the 
attribute  names  of  both  relations  had  been  different,  the 
target  list  in  the  RETRIEVE  statement  would  have  been: 

(r . ALL,  s . ALL) . 

Conversely,  the  QUEL  sequence  for  SUPPLIER  TIMES  PART 


is  shown  in  Figure  28. 
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RANGE  OF  r  IS  SUPPLIER 
RANGE  OF  s  IS  PART 

RETRIEVE  INTO  TEMP  ( r . ALL ,  s.partno, 
s.pname,  s. color,  s. weight,  city2  =  s.city) 

Figure  28:  Product  example  in  QUEL. 


The  RETRIEVE  statement  could  be  stated  equivalently  as: 
RETRIEVE  INTO  TEMP  (r.supno,  r.sname,  r. status,  city2  = 
r.city,  s . ALL ) . 


Join 

Given  two  relations,  R  and  S,  it  is  possible  to  derive 
a  relation,  T,  such  that  all  its  tuples  belong  to  a  subset 
of  the  cartesian  product  which  satisfy  a  join  condition,  P. 
P  is  a  logical  predicate  which  references  at  least  one  pair 
of  attributes,  one  from  each  relation,  and  compares  them 
using  any  of  the  relational  operators  as  described  in 
Chapter  II.  The  join  operation  can  be  accomplished  in  QUEL 
as  shown  in  Figure  29. 


RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 

RETRIEVE  INTO  T  ( r . ALL ,  S.ALL)  WHERE  P 
Figure  29:  Join  in  QUEL. 

The  QUEL  sequence  for  PART  JOIN  SUPPLIER  WHERE 
PART. city  =  SUPPLIER. city  is  shewn  in  Figure  30. 


RANGE  OF  r  IS  PART 
RANGE  OF  s  IS  SUPPLIER 
RETRIEVE  INTO  TEMP  ( r . ALL ,  s.supno, 
s.sname,  s. status,  city 2  =  s.city) 
WHERE  r.city  =  s.city 

Figure  30:  Join  example  in  QUEL. 


As  in  the  product,  duplicate  attribute  names  appear 
aliased  in  the  result. 

Natural  Join 

Given  two  relations,  R  and  S,  it  is  possible  to  derive 
a  relation,  T,  such  that  all  its  tuples  belong  to  the  subset 
of  the  cartesian  product  which  satisfy  a  join  condition  P. 

P  is  a  logical  predicate  which  references  at  least  one  pair 
of  attributes,  one  from  each  relation,  and  compares  them 
using  only  the  equality  operator.  Attribute  names  refer¬ 
enced  in  the  join  condition  only  appear  once  in  the  result. 
The  natural  join  operation  can  be  accomplished  in  QUEL  as 
shown  in  Figure  31. 


RANGE  OF  r  IS  R 
RANGE  OF  s  IS  S 

RETRIEVE  INTO  TEMP  ( r . ALL ,  s . ALL ) 
V7HERE  P 

Figure  31:  Natural  Join  in  QUEL. 


The  QUEL  sequence  for  PART  NJOIN  SUPPLIER  OVER  city  is 


shown  in  Fiaure  32. 
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translation  can  be  performed  without  accessing  the  target 
subsystem. 

The  algorithm  has  two  main  components:  tree  processing 
and  operation  mapping.  Tree  processing  determines  the 
correct  order  of  execution  for  the  operations  contained  in 
the  query  tree.  Operation  mapping  converts  algebraic 
operations  specified  in  the  query  tree  into  a  sequence  of 
QUEL  statements.  It  is  based  on  the  QUEL  sequences  that 
were  presented  in  Chapter  III. 


Tree  Processing 

Each  operation  specified  in  a  query  tree  node  will  use 
as  operands  relations  contained  in  the  target  subsystem 
and/or  results  of  operations  from  preceding  nodes.  The 
resulting  QUEL  sequence  provides  a  way  to  store  intermediate 
results  from  lower  level  nodes  so  they  can  reused  at  higher 
levels.  This  is  accomplished  through  the  use  of  temporary 
relations.  For  the  purposes  of  this  discussion,  temporary 
relations  will  be  referred  to  as  TEMPx  relations,  x  being  a 
sequence  number  which  is  generated  by  the  translator.  These 
TEMPx  relations  are  destroyed  at  some  point  in  the  execution 
of  the  QUEL  sequence,  when  they  are  no  longer  needed, 
e.g.  prior  to  translation  of  another  query  packet. 

The  correct  order  of  execution  is  determined  by  exam¬ 


ining  the  sibbling  pointers  at  each  node.  This  order  is 
noted,  e.g.  using  a  stack,  and  then  used  to  generate 
correctly  ordered  QUEL  statements.  Figure  33a  shows  an 


CHAPTER  IV 

TRANSLATION  ALGORITHM 


The  algorithm  which  describes  the  steps  necessary  to 
convert  relational  algebra  query  trees  into  QUEL  statements 
is  discussed  in  this  chapter.  The  algorithm  presented,  can 
be  used  to  implement  a  translator  program  which  generates 
QUEL  statements  that  are  executable  on  the  INGRES  subsystem. 
The  translator  takes  as  input  a  query  packet  file  in  a 
predefined  format  and  generates  the  proper  sequence  of  QUEL 
commands  to  be  executed  by  the  target  subsystem.  The  query 
packet  specifies  a  series  of  algebraic  operations  organized 
as  a  binary  tree  structure  in  which  the  order  of  precedence 
is  from  bottom  to  top. 

The  way  in  which  the  query  tree  is  stored  is  arbitrary 
and  depends  on  implementation  design,  thus  it  is  not 
discussed  here.  It  is  important  to  note,  however,  that  it 
contains  pointers  which  allow  the  translator  to  discern  the 
relationships  between  all  its  nodes.  It  must  also  contain 
information  which  defines  the  algebraic  operations  to  be 
performed  at  each  node,  i.e.  operation,  operands,  attri¬ 
butes,  conditions,  etc.  It  must  also  include  dictionary 
information  about  all  operands,  so  that  the  entire 
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example  of  a  query  tree.  Figures  33b  and  33c  show  possible 
orders  of  execution.  The  difference  between  the  stacks  is 
the  side  chosen  first  during  traversal.  The  left  branch  is 
chosen  first  in  Figure  33b,  while  the  right  branch  is  chosen 
first  in  Figure  33c. 


(a) 


Figure  33s  Query  tree  and  execution  stacks. 


Operation  Mapping 

This  portion  of  the  algorithm  translates  each  node 
operation  into  the  corresponding  sequence  of  QUEL  state¬ 
ments.  In  order  to  generate  the  appropriate  sequence 
for  a  given  node,  the  algorithm  uses  predefined  templates 
for  each  algebraic  operation,  i.e.  each  algebraic  operator 
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has  a  fixed  template  associated  with  it.  These  templates 
closely  resemble  the  sequences  presented  in  Chapter  III. 

To  translate  a  node  operation  into  the  corresponding 
QUEL  sequence,  the  template  is  coped  into  a  work  buffer  and 
the  details  which  are  particular  to  that  node,  i.e.  sequence 
numbers,  relation  names,  attribute  names ,  etc.,  are  inserted 
into  the  appropriate  fields  of  the  buffer,  thus  building  up 
the  appropriate  sequence  for  that  node.  Once  the  sequence 
has  been  built,  the  contents  of  the  buffer  can  be  saved, 
e.g.  appended  to  an  output  file. 

Having  a  template  for  each  operation  defines  a  unique 
series  of  programmed  steps  required  to  translate  the  query 
tree  to  QUEL  on  a  node-per-node  basis.  This  method  is 
relatively  simple  to  implement  although  it  may  not  be 
optimal.  The  node-per-node  translation  algorithm  is  pre¬ 
sented  to  acquaint  the  reader  with  the  main  details  of  the 
translation  process.  An  optimal  method  of  translation  is 
presented  in  the  next  chapter. 

In  the  coming  paragraphs,  the  following  algebraic 
operators  are  mapped  into  QUEL:  Union,  Intersection, 
Difference,  Selection,  Projection,  Product,  Join,  and 
Natural  Join. 


Projection 


The  steps  necessary  to  translate  a  node  containing  a 


projection  operation  are  shown  in  Figure  34. 
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Load  Projection  template  into  buffer  area; 

RANGE  OF  L  IS  relname 
RETRIEVE  INTO  TEMPx  (L.attrlist) 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  name  into  relname  field; 

Insert  attribute  names  into  target  list,  using 
L-prefixes  on  all  the  attribute  names  of  the  relation; 

Figure  34:  Pseudo-code  for  Projection  node. 

Selection 

The  steps  necessary  to  translate  a  node  containing  a 
selection  operation  are  shown  in  Figure  35. 

Load  Selection  template  into  buffer  area; 

RANGE  OF  L  IS  relname 

RETRIEVE  INTO  TEMPx  (L. ALL)  WHERE  P 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  name  into  relname  field; 

Insert  selection  predicate  into  P  field; 

Figure  35:  Pseudo-code  for  Selection  node. 


Union 

The  steps  necessary  to  translate  a  node  containing  a 
projection  operation  are  presented  in  Figure  36. 


29 


Load  Union  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 
RETRIEVE  INTO  TEMPx  ( L . ALL ) 

APPEND  TO  TEMPx  ( R . ALL ) 

RANGE  OF  X  IS  TEMPx 
RETRIEVE  INTO  TEMPy  (X. ALL) 

Generate  sequence  number  and  insert  into  TEMPx  and 
TEMPy  fields; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Figure  36:  Pseudo-code  for  Union  node. 


Intersection 


The  steps  necessary  to  translate  a  node  containing  a 
union  operation  are  shown  in  Figure  37. 


Load  Intersection  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 
RETRIEVE  INTO  TEMPx  ( L . ALL ) 

WHERE  L.attrl  =  R.attrl  AND  L.attr2  =  R.attr2 
AND  .  .  .  L.attrN  =  R.attrN 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  WHERE  clause  using  L-prefixes  with  the  attribute 
names  of  the  left  relation  and  R-prefixes  with  the 
attribute  names  of  the  right  relation; 

Figure  37:  Pseudo-code  for  Intersection  node. 


Difference 


The  steps  necessary  to  translate  a  node  containing  a 


difference  operation  are  shown  in  Figure  38. 
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Load  Difference  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 
RETRIEVE  INTO  Tempx  ( L . ALL ) 

RANGE  OF  L  IS  TEMPX 

DELETE  L  WHERE  L.attrl  =  R.attrl  AND  L.attr2  = 
R.attr2  .  .  .  AND  L.attrN  =  R.attrN 

Generate  sequence  number  and  insert  into  TEMPx  fields; 

Insert  relation  names  into  fields  relnamel  and 
relname2 ; 

Build  WHERE  clause  using  L-prefixes  with  the  attribute 
names  of  the  left  relation  and  R-prefixes  with  the 
attribute  names  of  the  right  relation; 

Figure  38:  Pseudo-code  for  Difference  node. 


Product 


The  steps  necessary  to  translate  a  node  containing  a 
product  operation  are  shown  in  Figure  39. 


Load  Product  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L.attrlistl ,  R.attrlist2) 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  target  list  using  L-prefixes  with  all.*-the  attri¬ 
bute  names  of  the  left  relation  and  R-prefixes  with  all 
the  attribute  names  of  the  right  relation; 

For  duplicate  attribute  names,  generate  alias  names  and 
enter  them  into  the  aliases  table; 

Figure  39:  Pseudo-code  for  Product  node. 


31 


Join 


The  steps  necessary  to  translate  a  node  containing  a 
join  operation  are  shown  in  Figure  40. 


Load  Join  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L . attrlistl ,  R.attrlist2) 
WHERE  PT 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  target  list  using  L-prefixes  with  all  the  attri¬ 
bute  names  of  the  left  relation  and  R-prefixes  with  all 
the  attribute  names  of  the  right  relation; 

For  duplicate  attribute  names  generate  alias  names  and 
enter  them  into  the  aliases  table; 

Build  WHERE  clause  using  the  join  predicate  and  add 
L-prefixes  to  the  attribute  names  of  the  left  relation 
and  R-prefixes  to  the  attribute  names  of  the  right 
relation; 

Figure  40.  Pseudo-code  for  Join  node. 


Natural  Join 


The  steps  necessary  to  translate  a  node  containing  a 


natural  join  operation  are  shown  in  Figure  41. 
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Join 


The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  join  operation  are  shown  in  Figure  53. 


Load  Difference  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L. attrlist. )  WHERE  PT 

-L  J_j 

RANGE  OF  T  IS  TEMPX 

DELETE  T  WHERE  T.a ttrL1  =  R.attrR1  AND  T.attrL2  = 

R.attr^^  .  .  .  AND  L.attrT  =  R.attr.,  AND  P„ 

R2  Ln  Rn  R 

Generate  sequence  number  and  insert  it  into  the  TEMPx 
fields; 

Insert  relation  names  into  relnamel  and  relname  2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  with  all  the  attribute  names  of  the  left 
topmost  attribute  list; 

Build  first  WHERE  clause  using  using  L-prefixes  with 
all  the  attribute  names  in  the  left  composite 
predicate; 

Build  second  WHERE  clause  using  T-prefixes  on  all  the 
attribute  names  of  the  left  topmost  attribute  list  and 
R-prefixes  on  all  the  attribute  names  of  the  right 
topmost  attribute  list; 

Add  R-prefixes  to  the  attribute  names  of  the  right 
composite  predicate  and  append  to  the  second  WHERE 
clause; 


Figure  51:  Pseudo-code  for  Type  II 
node  with  Difference. 
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Load  Intersection  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 
RETRIEVE  INTO  TEMPx  (L . attrlist. ) 

Jj 

WHERE  L.attrL1  =  R.attrR1  AND  L.attr^^  = 

R.attr_»  .  .  .  AND  L.attr-  =  R.attr., 

R2  Ln  Rn 

AND  PT  AND  P_ 

Li  K 

Generate  sequence  number  and  insert  it  into  the  TEMPx 
field; 

Insert  relation  names  into  the  relnamel  and  relname2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  with  all  the  attribute  names  of  the  left 
topmost  attribute  list; 

Select  corresponding  attribute  names  from  the  right 
topmost  and  left  topmost  attribute  lists  and  append  to 
WHERE  clause  using  the  proper  L-  and  R-prefixes; 

Build  WHERE  clause  using  L-prefixes  with  the  attribute 
names  in  the  left  composite  predicate  and  R-prefixes  on 
the  attribute  names  in  the  right  composite  predicate; 

Figure  50:  Psuedo-code  for  Type  II 
node  with  Intersection. 


Difference 

The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  difference  operation  are  shown  in  Figure  51. 

Product 

The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  product  operation  are  shown  in  Figure  52. 
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Load  Union  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L . attrlistL)  WHERE  PL 

APPEND  TO  TEMPx  (attrL1  =  L.attrRl,  attrR2  = 

L.attr_-,  .  .  .  attr.  =  L.attr_  ) 

R2  Ln  Rn 

WHERE  PD 

R 

RANGE  OF  T  IS  TEMPx 
RETRIEVE  INTO  TEMPy  (T.ALL) 

Generate  sequence  numbers  and  insert  them  into  the 
TEMPx  and  TEMPy  fields; 

Insert  relation  names  into  the  relnamel  and  relname2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  on  all  the  attribute  names  in  the  left 
topmost  attribute  list; 

Build  first  WHERE  clause  using  L-prefixes  on  the 
attribute  names  in  left  composite  predicate; 

Build  target  list  for  APPEND  statement  using  R-prefixes 
on  all  the  attribute  names  of  the  right  topmost  attri¬ 
bute  list  aliased  to  all  the  attribute  names  of  the 
left  topmost  attribute  list; 

Build  first  WHERE  clause  using  R-prefixes  on  the 
attribute  names  in  the  right  composite  predicate; 

Figure  49:  Pseudo-code  for  Type  II 
node  with  Union. 


Intersection 


The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  intersection  operation  are  shown  in  Figure  50. 
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RANGE  OP  L  IS  RELATION1 
RANGE  OF  R  IS  RELATION2 

RETRIEVE  INTO  RESULT  (L. lef t-topmost-attrlist , 
R.right-topmost-attrlist) 

WHERE  join-predicate  AND  left-composite-predicate 
AND  right-composite-predicate 

Natural  Join.  When  two  Type  1  branches  are  combined 
through  the  natural  join  operation,  it  is  equivalent  to  the 
following: 

RANGE  OF  L  IS  RELATI0N1 
RANGE  OF  R  IS  RELATI0N2 

RETRIEVE  INTO  RESULT  (L. left-topmost-attrlist , 
R.right-topmost-attrlist) 

WHERE  join-predicate  AND  left-composite-predicate 
AND  right-composite-predicate 

Optimal  Algorithm 

We  shall  now  present  the  steps  for  query  translation 
for  each  of  the  binary  operations  which  use  the  minimization 
rules  just  shown. 

Union 

The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  union  operation  are  shown  in  Figure  49. 


41 


AND  L.attrl  =  R.attrl  AND  L.attr2  =  R.attr2 
AND  .  .  .  L.attrn  =  R.attrn 

Difference.  When  two  Type  1  branches  are  combined 
through  the  difference  operation,  it  is  equivalent  to  the 
following: 

RANGE  OF  L  IS  RELATIONl 
RANGE  OF  R  IS  RELATION2 

RETRIEVE  INTO  RESULT  (L. left-topmost-attrlist) 

WHERE  left-composite-predicate 
RANGE  OF  T  IS  RESULT 

DELETE  T  WHERE  right-composite-predicate  AND 
L.attrl  =  R.attrl  AND  L.attr2  =  R.attr2 
AND  .  .  .  AND  L.attrn  =  R.attrn 

Product.  When  two  Type  1  branches  are  combined  through 
the  product  operation,  it  is  equivalent  to  the  following: 
RANGE  OF  L  IS  RELATIONl 
RANGE  OF  R  IS  RELATI0N2 

RETRIEVE  INTO  RESULT  (L. left-topmost-attrlist , 
R.right-topmost-attrlist) 

WHERE  left-composite-predicate  AND 
right-composite-predicate 

Join.  When  two  Type  1  branches  are  combined  through 


the  join  operation,  it  is  equivalent  to  the  following: 
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RANGE  OF  X  IS  R 

RETRIEVE  INTO  TEMP  (X . topmost-attrlist) 

WHERE  composite-predicate 

Type  II 

This  type  of  pattern  has  one  variation  for  each  of  the 
binary  operations. 

Union.  When  two  Type  1  branches  are  combined  through 
the  union  operation,  it  is  equivalent  to  the  following: 
RANGE  OF  L  IS  RELATIONl 
RANGE  OF  R  IS  RELATION2 

RETRIEVE  INTO  TEMP  (L. left-topmost-attrlist) 

WHERE  left-composite-predicate 

APPEND  TO  TEMP  (R. right-topmost-attrlist) 

WHERE  right-composite-predicate 

RANGE  OF  T  IS  TEMP 

RETRIEVE  INTO  RESULT  (T. ALL) 

Intersection.  When  two  Type  1  branches  are  combined 
through  the  intersection  operation,  it  is  equivalent  to  the 
following: 

RANGE  OF  L  IS  RELATIONl 
RANGE  OF  R  IS  RELATION2 

RETRIEVE  INTO  RESULT  (L. left-topmost-attrlist) 

WHERE  left-composite-predicate  AND 
right-composite-predicate 
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RETRIEVE  INTO  TEMPn  (X.attrlist  n) 
is  equivalent  to 

RANGE  OF  X  IS  R 

RETRIEVE  INTO  TEMP  {X.attrlist  n) 

Consecutive  selections.  Any  number  of  consecutive 
selections  on  a  relation  R  is  equivalent  to  one  selection 
which  uses  a  selection  predicate  which  is  the  conjunction  of 
all  their  predicates.  This  predicate  is  referred  to  as  the 
composite  predicate.  Thus, 

RANGE  OF  X  IS  R 

RETRIEVE  INTO  TEMPI  (X .ALL)  WHERE  P 
RANGE  OF  X  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (X . ALL)  WHERE  P2 

• 

# 

RANGE  OF  X  IS  TEMPm-1 

RETRIEVE  INTO  TEMPm  ( X . ALL )  WHERE  P 

m 

is  equivalent  to 

RANGE  OF  X  IS  R 

RETRIEVE  INTO  TEMP  WHERE  P.  AND  P-  AND  .  .  .  AND  P 

12  m 

when  P^,  P2 ,  .  .  .  ,  Pm  are  predicates  which  reference  only 
attributes  of  R. 

Mixed  projections  and  selections.  Any  mix  of  selec¬ 
tions  and  projections  on  relation  R,  is  equivalent  to  one 
projection  which  uses  the  topmost  attribute  list  and  one 
selection  which  uses  the  composite  predicate,  as  follows: 
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Figure  48:  Basic  pattern  types. 


Type  1 

The  two  unary  operations,  projection  and  selection, 
give  rise  to  three  variations  of  this  pattern:  consecutive 
projections  only,  consecutive  selections  only,  and  mixed 
selections  and  projections. 

Consecutive  projections.  Any  number  of  consecutive 
projections  on  a  relation  R,  such  that  the  set  of  attributes 
specified  in  any  projection  is  always  a  subset  of  the  set  of 
attributes  specified  in  the  previous  projection,  is  equiva¬ 
lent  to  one  projection  on  R  using  the  attribute  set  of  the 
last  projection.  This  attribute  set  is  referred  to  as  the 
topmost  attribute  list.  Thus, 

RANGE  OF  X  IS  R 

RETRIEVE  INTO  TEMPI  (X.attrlist  1) 

RANGE  OF  X  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (X.attrlist  2) 


RANGE  OF  X  IS  TEMPn-1 


CHAPTER  V 
MINIMIZATION 


The  node-per-node  translation  algorithm  presented  in 
the  previous  section  can  be  implemented  with  relative  ease 
but  the  large  number  of  QUEL  statements  that  it  produces 
makes  it  inefficient.  By  introducing  minimization  tech¬ 
niques  [Chu82] ,  it  is  possible  to  produce  a  smaller  number 
of  QUEL  statements  which  accomplish  the  same  results  as  the 
node-per-node  translation. 

Minimization  Rules 

Compression  of  the  query  tree  during  the  tree  process¬ 
ing  phase  yields  a  new  query  tree  which  consists  of  fewer 
nodes.  The  nodes  of  the  new  query  tree  specify  combinations 
of  algebraic  operations  that  can  be  expressed  with  less  QUEL 
statements.  Selection  of  nodes  to  be  compressed  is  accom¬ 
plished  by  recognition  of  two  basic  types  of  reoccurring 
patterns,  shown  in  Figure  48. 

The  first  pattern  occurs  when  a  series  of  consecutive 
unary  operations  is  specified  and  will  be  referred  to  as 
Type  1.  The  second  pattern  occurs  when  the  results  of  two 
separate  Type  1  patterns  are  combined  through  a  binary 
operation  and  will  be  referred  to  as  Type  II. 
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7  RANGE  OF  L  IS  SPJ 

RETRIEVE  INTO  TEMPI  (L. ALL)  WHERE  L.supno  =  "si" 

6  RANGE  OF  L  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (R.partno) 

5  RANGE  OF  L  IS  PART 

RETRIEVE  INTO  TEMP 3  (L.partno) 

4  RANGE  OF  L  IS  TEMP 3 
RANGE  OF  R  IS  TEMP 2 
RETRIEVE  INTO  TEMP 4  ( L . ALL ) 

RANGE  OF  T  IS  TEMP 4 
DELETE  T  WHERE  T.partno  =  R.partno 
3  RANGE  OF  L  IS  SPJ 
RANGE  OF  R  IS  TEMP 4 

RETRIEVE  INTO  TEMP 5  (L. ALL)  WHERE  L.partno  =  R.partno 
2  RANGE  OF  L  IS  TEMP 5 

RETRIEVE  INTO  TEMP 6  (L.projno) 

RANGE  OF  L  IS  PROJECT 
RETRIEVE  INTO  TEMP 7  (L.projno) 

0  RANGE  OF  L  IS  TEMP 7 
RANGE  OF  R  IS  TEMP 6 
RETRIEVE  INTO  TEMP 7  ( L . ALL ) 

RANGE  OF  T  IS  TEMP 7 

DELETE  T  WHERE  T.projno  =  R.projno 

Figure  47:  QUEL  translation  of  Query  3. 
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RANGE  OF 
RETRIEVE 
RANGE  OF 
RETRIEVE 
RANGE  OF 
RANGE  OF 
RETRIEVE 
R.projno 


L  IS  PROJECT 

INTO  TEMPI  (L.projno,  L.city) 

L  IS  SUPPLIER 

INTO  TEMP 2  {L.supno,  L.city) 

L  IS  TEMP 2 
R  IS  SPJ 

INTO  TEMP 3  ( L . ALL ,  supno2  =  R.supno, 
,  R.qty) 


R.partno 


RANGE  OF  L  IS  TEMP 3 
RANGE  OF  R  IS  TEMPI 

RETRIEVE  INTO  TEMP 4  (L.ALL,  projno2  =  R.projno, 


city2  =  R.city) 


/ 


RANGE  OF  L  IS  TEMP 4 
RETRIEVE  INTO  TEMP 5  (L.ALL) 


WHERE  L.supno  =  L.supno2  AND  L.projno  =  L.projno2 
RANGE  OF  L  IS  TEMP 5 

RETRIEVE  INTO  TEMP 6  (L.city,  L.partno,  L.city2) 


Figure  45:  QUEL  translation  for  Query  2. 


from  supplier  SI."  The  equivalent  algebraic  representation 
for  this  query  is: 

PROJECTfpro jnoj  MINUS  ({SPJ  NJOIN (PART [partno] MINUS 
(SPJ  WHERE  supno  =  'SI')  [partno]))  [projno] ) 
and  the  query  tree  and  order  of  traversal  are  shown  in 
Figure  46.  The  corresponding  QUEL  translation  is  shown  in 
Figure  47. 


fa)  MINUS 

[projno]  Q  0  [projno] 

PROJECT  over  partno 

SPJ  QMINUS 
[partno]  (p  Vp [partno] 

PART  Q  SPJ  WHERE  supno-  sf 
SPJ 


Figure  46:  Query  tree  for  Query  3. 
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Query  2 

Exercise  7.19  in  [Dat82]  read  as  follows:  "Get  all 
(city,  partno,  city)  triples  such  that  a  supplier  in  the 
first  city  supplies  the  specified  part  to  a  project  in  the 
second  city."  The  equivalent  algebraic  representation  for 
this  query  is: 

( (SUPPLIER [supno,  city]  TIMES  SPJ  TIMES  PROJECT [pro jno, 
city])  WHERE  SUPPLIER . supno  =  SPJ. supno  AND 
SPJ.prc^no  =  PROJECT . pro jno) [SUPPLIER. city,  partno, 
PROJECT. city] 

and  the  query  tree  and  order  of  traversal  are  shown  in 
Figure  44.  The  corresponding  QUEL  translation  is  shown  in 
Figure  45. 


[city, partno,  city] 

I  ( supnossupno  AND  projno=projno  )  1X3 

GD 

TIMES  jjj 

TIMES  i  [projno,  city]  QD 

[supno,  city](?f  SPJ  PROJECT  ® 

SUPPLIER 

Figure  44:  Query  tree  for  Query  2. 


Query  3 

Exercise  7.25  in  [Dat82]  read  as  follows:  "Get  projno 
values  for  projects  which  use  only  parts  which  are  available 
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Query  1 

Exercise  7.17  in  [Dat82]  reads  as  follows:  "Get  projno 
values  for  projects  using  at  least  one  part  available  from 
supplier  SI."  The  algebraic  representation  for  this  query 
is: 


((SPJ  WHERE  supno  =  ’SI’) [partno]  NJOIN  SPJ) [projno] 
and  the  query  tree  and  order  of  traversal  are  shown  in 
Figure  42.  The  corresponding  QUEL  translation  is  shown  in 
Figure  43. 


Iprojrto]  (“) 
NJOIN  over  partno 
[partno]  Qj 
(supno^'sf)  Q 
SPJ 


HI 

m 

m 


Figure  42:  Query  tree  for  Query  1. 


3  RANGE  OF  L  IS  SPJ 

RETRIEVE  INTO  TEMPI  (L.A11)  WHERE  L. supno  =  "si" 

2  RANGE  OF  L  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (L. partno) 

RANGE  OF  L  IS  TEMP 2 
1  RANGE  OF  R  IS  SPJ 

RETRIEVE  INTO  TEMP 3  (L. partno,  R. supno,  R. projno,  R.qty) 
WHERE  L. partno  *  R. partno 
0  RANGE  OF  L  IS  TEMP 3 

RETRIEVE  INTO  TEMP 4  (L. projno) 

Figure  43:  QUEL  translation  for  Query  1. 


Load  Natural  Join  template  into  buffer  area; 

RANGji  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L. attrlistl ,  R.attrlist2) 
WHERE  Pj 

Generate  sequence  number  and  insert  into  TEMPx  field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  target  list  using  L-prefixes  with  all  the  attri¬ 
bute  names  of  the  left  relation  and  R-prefixes  with  all 
the  attribute  names  of  the  right  relation; 

Delete  duplicate  attribute  names  which  appear  on  the 
right  side(s)  of  the  join  predicate; 

For  other  duplicate  attribute  names,  generate  alias 
names  and  enter  them  into  the  aliases  table; 

Build  WHERE  clause  using  the  join  predicate  and  add 
L-prefixes  to  the  attribute  names  of  the  left  relation 
and  R-prefixes  to  the  attribute  names  of  the  right 
relation; 

Figure  41:  Pseudo-code  for  Natural  Join  node. 


Examples 


The  following  examples  illustrate  the  mechanics  of  the 
translation  algorithm  and  are  based  on  the  Parts  &  Suppliers 
Database  found  in  [Dat82] ,  shown  below: 


SUPPLIER  (supno,  sname,  status,  city) 

PART  (partno,  pname,  color,  weight,  city) 
PROJECT  (projno,  jname,  city) 

SPJ  (supno,  partno,  projno,  qty) 
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Load  product  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname  2 

RETRIEVE  INTO  TEMPx  (L.attrlist  ,  R.attrlist  ) 
WHERE  PL  AND  PR 

Generate  sequence  number  and  insert  it  into  the  TEMPx 
field; 

Insert  relation  names  into  the  relnamel  and  relname2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  with  all  the  attribute  names  of  the  left 
topmost  attribute  list  and  R-prefixes  with  all  the 
attribute  names  of  the  right  topmost  attribute  list; 

Alias  duplicate  attribute  names  of  the  right  topmost 
attribute  list  to  local  names  annotated  in  aliases 
table; 

Build  WHERE  clause  using  L-prefixes  with  the  attribute 
names  in  the  left  composite  predicate  and  R-prefixes 
with  the  attribute  names  in  the  right  composite 
predicate; 

Figure  52:  Pseudo-code  for  Type  II 
node  with  Product. 
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Load  Join  template  into  buffer  area; 

RANGE  OP  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L. attrlist. ,  R.attlist  ) 

WHERE  Pj  AND  PL  AND  PR  R 

Generate  sequence  number  and  insert  it  into  the  TEMPx 
field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  with  all  the  attribute  names  of  the  left 
topmost  attribute  list  and  R-prefixes  with  the  attri¬ 
bute  names  of  the  right  topmost  attribute  list; 

Alias  duplicate  attribute  names  of  the  right  topmost 
attribute  list  to  local  names  annotated  in  aliases 
table; 

Build  WHERE  clause  using  using  L-prefixes  with  attri¬ 
bute  names  in  the  left  composite  predicate,  R-prefixes 
with  attribute  names  in  the  right  composite  predicate, 
and  the  proper  prefixes  with  the  attribute  names  in  the 
join  predicate; 

Figure  53:  Pseudo-code  of  Type  II  node  with  Join. 
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Natural  Join 


The  steps  necessary  to  translate  a  Type  II  node  con¬ 
taining  the  natural  join  operation  are  shown  in  Figure  54. 


Load  Natural  Join  template  into  buffer  area; 

RANGE  OF  L  IS  relnamel 
RANGE  OF  R  IS  relname2 

RETRIEVE  INTO  TEMPx  (L.attrlist  ,  R.attlist  ) 
WHERE  PT  AND  PT  AND  P_  h  R 

u  -b  K 

Generate  sequence  number  and  insert  it  into  the  TEMPx 
field; 

Insert  relation  names  into  relnamel  and  relname2 
fields; 

Build  target  list  for  RETRIEVE  statement  using 
L-prefixes  with  all  the  attribute  names  of  the  left 
topmost  attribute  list  and  R-prefixes  with  the  attri¬ 
bute  names  of  the  right  topmost  attribute  list; 

Delete  redundant  attribute  names,  i.e.  the  attribute 
names  appearing  on  the  right  side  of  the  join 
predicate; 

Alias  duplicate  attribute  names  of  the  right  topmost 
attribute  list  to  local  names  annotated  in  aliases 
table; 

Build  WHERE  clause  using  L-prefixes  with  attribute 
names  in  the  left  composite  predicate,  R-prefixes  with 
attribute  names  in  the  right  composite  predicate,  and 
the  proper  prefixes  with  the  attribute  names  in  the 
join  predicate; 

Figure  54:  Pseudo-code  for  Type  II 
node  with  Natural  Join. 
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Examples 

The  examples  which  are  presented  below  illustrate  the 
improvement  that  results  from  applying  the  new  translation 
algorithm  to  the  sample  queries  presented  in  the  previous 
chapter.  A  reduced  number  of  QUEL  statements  is  achieved 
when  using  this  method  of  translation. 

Query  1 

The  translation  of  this  query  can  be  optimized  by 
compressing  the  query  tree  as  shown  in  Figure  55.  The  QUEL 
translation  for  the  compressed  tree  is  shown  in  Figure  56. 


Figure  55:  Tree  compression  for  Query  1. 


1  RANGE  OF  L  IS  SPJ 
RANGE  OF  R  IS  SPJ 

RETRIEVE  INTO  TEMPI  (L.partno,  R.supno,  R.projno,  R.qty) 
WHERE  L.supno  =  "si"  AND  L.partno  =  R.partno 
0  RANGE  OF  L  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (L.projno) 


Figure  56:  Optimized  QUEL  translation  for  Query  1. 


Query  2 

The  translation  of  this  query  can  be  optimized  by 
compressing  the  query  tree  as  shown  in  Figure  57.  The  QUEL 
translation  for  the-  compressed  tree  is  shown  in  Figure  58. 


Figure  57:  Tree  compression  for  Query  2. 


3  RANGE  OF  L  IS  SUPPLIER 

RANGE  OF  R  IS  SPJ 

RETRIEVE  INTO  TEMPI  (L.supno,  L.city, 
supno2  =  R.supno,  R.partno,  R.projno,  R.qty) 

2  RANGE  OF  L  IS  TEMPI 

RANGE  OF  R  IS  PROJECT 

RETRIEVE  INTO  TEMP 2  (L.ALL,  projno2  =  R.projno 
city2  *  R.city) 

0  RANGE  OF  L  IS  TEMP 2 

RETRIEVE  INTO  TEMP 3  (L.city,  L.partno,  L.city2) 

WHERE  L.supno  »  L.supno2  AND  L.projno  =  L.projno2 

Figure  58:  Optimized  QUEL  translation  for  Query  2. 


Query  3 

The  translation  of  this  query  can  be  optimized  by 
compressing  the  query  tree  as  shown  in  Figure  59.  The  QUEL 
translation  for  the  compressed  tree  is  shown  in  Figure  60. 
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Figure  59: 


Tree  compression  for  Query  3. 


4  RANGE  OF  L  IS  PART 

RANGE  OF  P  IS  SPJ 

RETRIEVE  INTO  TEMPI  (L.partno) 

RANGE  OF  T  IS  TEMPI 

DELETE  T  WHERE  R.supno  =  "si"  AND  T.partno  =  R.partnc 
3  RANGE  OF  L  IS  SPJ 

RANGE  OF  R  IS  TEMPI 

RETRIEVE  INTO  TEMP 2  (L.projno)  WHERE  L.partno  =  R.partno 
0  RANGE  OF  L  IS  PROJECT 
RANGE  OF  R  is  TEMP 2 
RETRIEVE  INTO  TEMP 3  (L.projno) 

RANGE  OF  T  IS  TEMP 3 

DELETE  T  WHERE  T.projno  =  R.projno 

Figure  60:  Optimized  QUEL  translation  for  Query  3. 


CHAPTER  VI 
IMPLEMENTATION 


This  chapter  discusses  some  general  guidelines  to 
follow  ir.  implementing  the  algorithm  presented  in  the 
previous  chapter.  An  implementation  has  been  successfully 
completed  in  the  HC"  language  by  Mr.  Mohamed  Khatib,  a 
graduate  student  currently  doing  related  research  for  the 
NBS  project  at  the  time  of  this  writing.  The  main  topics 
that  will  be  discussed  are:  query  packet  format,  tree 
processing,  operation  mapping,  and  minimization. 

Query  Packet  Format 

The  Master  Data  Administrator  System  (MDAS)  of  [NBS85] 
issues  data  management  commands,  in  the  form  of  query 
packets,  to  each  of  the  distributed  data  management  sub¬ 
systems.  The  packets  are  organized  into  records  of  pre¬ 
defined  format.  These  records  contain  all  the  algebraic 
operators,  relation  names,  attribute  names,  and  qualifi¬ 
cations  which  completely  specify  what  the  subsystem  has  to 
do. 

Each  query  packet  has  three  types  of  records:  tree 
descriptors,  operation  descriptors,  and  relation  tables. 
Tree  descriptors  define  the  structure  of  a  query  tree, 
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that  is,  they  describe  the  tree  in  terms  of  node  inter¬ 
relationships,  algebraic  operators,  and  operand  relations. 
Operation  descriptors  contain  either  attribute  lists  or 
qualifications,  which  further  define  the  operations  at  each 
node.  Relation  tables  contain  dictionary  information  about 
the  relations  which  are  referenced  in  the  query  tree,  which 
makes  it  possible  to  perform  the  translation  without 
accessing  the  target  subsystem. 

Tree  Descriptors 

These  records  define  a  binary  tree  which  determines  the 
order  of  operations  to  be  performed.  Each  record  consists 
of  the  following  fields:  operation,  left-son,  right-son, 
left-relation,  and  right-relation.  Operation  is  one  of  the 
algebraic  operators,  left-son  and  right-son  are  the  sequence 
numbers  of  the  tree  descriptor  records  which  describe  that 
node's  left  and  right  successor  nodes,  and  left-relation  and 
right-relation  are  the  names  of  the  relations  used  at  that 
node . 

The  third  example  from  Chapter  IV,  whose  query  tree  is 
shown  again  in  Figure  61,  could  be  represented  by  the 
tree  descriptor  records  which  appear  in  Table  1 .  The 
information  shown  in  parenthesis  is  only  explanatory  and 
would  not  actually  appear  in  the  query  packet.  The  hyphen¬ 
ated  entries  are  used  to  denote  that  a  given  node  does  not 
have  a  left  and/or  right  successor  or  that  intermediate 
results  are  used  as  operands. 


•/  •/  ••»*•••*.  * 


«--V. 
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fl}  MINUS 

[projno]  O  0  Iprojnol 

PROJECT  j^NJOIN  over  partno 
ft  MINUS 


PART 


m[portno] 

□  SPJ  WHERE  3upno=‘er 


SPJ 


Figure  61:  Query  tree  for  Example  3, 


Table  1 .  Tree  descriptors  for  Example  3 . 


(Rec  #)  (Oper)  (L-Son)  (R-Son)  (L-Rel)  (R-Rel) 


(0) 

MINUS 

1 

(1) 

PROJ 

- 

(2) 

SEL 

3 

(3) 

NJOIN 

- 

(4) 

MINUS 

5 

(5) 

MINUS 

- 

(6) 

PROJ 

7 

(7) 

SEL 

- 

2 

PROJECT 

4  SPJ 

6 

PART 


SPJ 
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Operation  Descriptors 

These  records  are  defined  according  to  the  algebraic 
operations  that  they  specify.  There  are  three  types  of 
field  descriptors:  attribute  lists,  predicates,  and  rela¬ 
tion  tables. 

Attribute  lists.  Attribute  lists  are  used  to  specify 
attribute  names  for  the  projection  operation.  They  consist 
of  the  following  fields: 

N,  attr-namel,  attr-name2,  .  .  .  attr-nameN 
where  N  is  the  number  of  attributes  in  the  list  and  attr- 
namel  through  attr-nameN  are  the  qualified  attribute  names. 
Using  qualified  attribute  names  helps  the  translator  attach 
the  appropriate  tuple  variable  to  each  attribute  name  by 
replacing  each  relation  name  prefix  with  the  proper  tuple 
variable  prefix.  It  also  simplifies  attribute  name  aliasing 
and  eliminated  ambiguity  in  projection  operations  which 
permute  attributes. 

Predicates.  Predicates  are  used  with  selection  and 
join  operations.  Each  predicate  consists  of  a  QUEL- 
compatible  qualification  string.  In  order  to  a  predicate  to 
be  QUEL-compatible  it  must  use  qualified  names  and  all  the 
QUEL  comparison  operator  symbols.  Using  QUEL-compatible 
qualification  strings  saves  the  translator  the  task  of 
parsing  these  predicates.  At  run  time,  the  translator 
merely  scans  the  predicate,  replaces  each  relation  name 
prefix  with  the  proper  tuple  variable  prefix,  and  inserts 
the  qualification  string  into  the  QUEL  sequence. 
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Relation  tables.  Relation  tables  contain  all  the 
attribute  names  for  each  relation  referenced  in  the  query 
tree.  In  this  case,  however,  the  attribute  names  need  not 
be  not  qualified,  that  is,  they  do  not  contain  relation  name 
prefixes  since  it  is  understood  that  all  the  attributes 
belong  to  the  relation  specified  in  that  record.  Relation 
tables  consist  of  the  following  fields: 

relation-name,  N,  attr-namel,  attr-name2,  .  .  .  attr-nameN 
where  relation-name  is  self  explanatory,  N  is  the  number  of 
attributes,  and  attribute-name 1  through  attribute-nameN  are 
all  the  attribute  names  for  that  relation,  in  the  order  they 
appear  in  the  view  or  base  relation  of  the  target  subsystem. 
For  economy  reasons,  only  one  relation  table  is  required  for 
each  relation  referenced  in  the  query  tree. 

To  avoid  ambiguity,  delimiters  must  be  used  with  all 
three  types  of  descriptors.  The  operation  descriptors  for 
Example  3  are  shown  in  Table  2. 

Tree  Processing 

The  objectives  of  this  part  of  the  implementation  are 
to  arrive  at  the  correct  node  ordering  for  processing  and  to 
minimize  the  number  of  nodes  to  be  translated. 

As  mentioned  in  Chapter  IV,  the  correct  order  of  execu¬ 
tion  is  computed  using  the  sibling  pointers  for  each  node 
and  the  saved  on  a  stack.  The  content  of  this  stack  can  be 
generated  with  a  recursive  function  which  traverses  the  tree 
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Table  2.  Operation  descriptors  for  Example  3. 


(Record  Content) 

1  PROJECT. pro j no 
1  SPJ.projno 

SPJ.partno  =  PART.partno 
1  PART.partno 
1  SPJ.partno 
SPJ.supno  =  "SI" 

SPJ  4  supno  partno  pro j no  qty 
PROJECT  3  pro j no  jname  city 
PART  5  partno  pname  color 
weight  city 


(Remarks) 

attr  list  for  node  1 
attr  list  for  node  2 
join  condition  for  node  3 
attr  list  for  node  5 
attr  list  for  node  6 
select  condition  for  node  7 
relation  table  for  SPJ 
relation  table  for  PROJECT 

relation  table  for  PART 


to  find  all  the  leaf  nodes.  This  stack  is  then  used  to 
generate  the  correct  order  of  QUEL  statements. 

Minimization  considerations  are  made  while  analyzing 
the  tree  in  order  to  produce  an  optimal  number  of  correctly 
ordered  statements.  The  criteria  for  compressing  the 
original  tree  are  actually  described  in  the  next  section. 

At  this  point  it  is  sufficient  to  note  that  the  stack  which 
defines  the  correct  order  of  execution  for  the  original  tree 
can  be  modified  after  the  compressed  tree  is  defined.  An 
alternate  method  is  to  integrate  the  compression  criteria 
into  the  recursive  function  so  that  only  the  modes  contain¬ 
ing  binary  operators  are  placed  on  the  stack.  In  either 
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case,  all  the  information  pertaining  to  unary  nodes  must  be 
linked  to  the  corresponding  compressed  nodes. 

Operation  Mapping 

There  are  two  important  mechanisms  needed  to  support 
the  translation  process;  statement  generation  and  attribute 
tracking.  The  first  one  is  used  to  build  QUEL  statements  in 
a  buffer  through  template  modification.  The  second  is  used 
to  keep  track  of  the  attribute  names  of  relations  throughout 
the  translation  process.  Both  mechanisms  are  discussed 
below. 

Statement  Generation 

The  translation  of  a  node  into  QUEL  takes  place  via 
modification  of  the  corresponding  template  in  the  work 
buffer.  Modifying  the  template  to  reflect  the  characteris¬ 
tics  of  the  operation  specified  at  that  node,  is  accom¬ 
plished  through  following  steps:  a)  insertion  of  the 
sequence  numbers  for  intermediate  results  (TEMPx  relations) , 

b)  insertion  of  all  pertinent  attribute  names  with  proper 
range  variable  prefixes  into  the  corresponding  target  lists, 

c)  replacement  of  qualified  attribute  names  with  appropriate 
row  markers  and  attribute  names  in  logical  predicates  and 
subsequent  insertion  into  the  corresponding  qualification 
fields,  and  d)  replacement  of  attribute  names  by  aliased 
names  generated  in  preceding  intermediate  results. 
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Attribute  Tracking 

Since  QUEL  does  not  permit  the  use  of  duplicate  attri¬ 
bute  names  within  the  same  relation,  it  is  necessary  to 
"alias"  the  names  of  duplicate  attribute  names  in  inter¬ 
mediate  results.  In  order  to  replace  any  attribute  name  by 
the  correct  alias  name,  the  translator  must  keep  track  of 
all  attribute  names.  This  can  be  accomplished  with  an 
attribute  name  table  with  an  entry  for  each  attribute  name 
from  a  temporary  relation  which  contains  its  relation  name 
and  the  original  qualified  name.  By  tracking  attribute 
names  in  this  manner,  the  translator  is  able  to  look  up  all 
attribute  names  before  insert  them  into  the  work  buffer. 

The  sample  attribute  name  table  shown  in  Table  3  corresponds 
to  the  second  example  presented  in  Chapter  V. 

Minimization 

Minimization  in  initiated  by  identifying  the  nodes  that 
contain  binary  operations.  These  nodes  will  be  the  only 
nodes  which  will  exist  after  compression.  There  is  one 
exception:  any  series  of  unary  operations  which  are  near 

the  root  of  the  tree  and  which  are  not  followed  by  any 
binary  operations  will  also  become  a  single  node.  Fig¬ 
ure  62a  identifies  the  nodes  which  will  exist  after  com¬ 
pression  and  Figure  62b  shows  the  resulting  tree  following 
compression. 
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Figure  78:  Aggregate  Name  specification 


Figure  79:  Function  Name  specification 


Figure  72:  Expression  specification 


Figure  74:  Factor  specification 


Figure  69:  Qualification  specification. 


Figure  71:  Boolean  primary  specification 


Figure  68:  The  WHERE  clause 


APPENDIX 
INGRES  SYNTAX 


The  purpose  of  this  appendix  is  to  acquaint  the  reader 
with  the  portions  of  the  INGRES  syntax  which  are  used 
reoccurringly  in  the  chapters  of  this  thesis.  The  syntax  is 
presented  in  graphic  form.  The  graphic  constructs  used  are 
shown  in  Figure  63.  Figures  63a  and  63b  represent  atomic 
constructs,  i.e.  they  cannot  be  recursively  decomposed. 
Figure  63c  represents  a  construct  which  is  not  atomic, 
i.e.  it  can  be  made  up  of  atomic  or  non-atomic  constructs. 


(a)  <b)  (c) 

Figure  63:  Graphical  syntax  constructs. 

The  words  which  appear  within  the  graphic  constructs 
have  two  forms:  keywords  and  non-keywords.  Keywords  are 
always  underlined,  non-keywords  are  not.  The  INGRES 
statements  which  are  shown  are:  APPEND  (Figure  64) ,  DELETE 
(Figure  65) ,  RANGE  (Figure  *6) ,  and  RETRIEVE  (Figure  67) . 

In  addition,  the  WHERE  clause  (Figure  68)  and  its  components 
(Figures  69-79)  are  also  shown. 
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It  is  recommended  that  an  extended  version  of  the  relational 
algebra  be  developed  to  better  utilize  the  processing 
capabilities  of  other  target  subsystems. 

Another  issue  addressed  by  this  author  is  the  scope  of 
the  patterns  used  for  the  minimization  rules  presented  in 
Chapter  V.  The  translation  algorithm  presented  in 
Chapter  IV  obviously  generates  too  many  QUEL  statements  and 
the  algorithm  presented  in  Chapter  V  is  a  worthwhile 
improvement.  However,  it  is  debatable  whether  the  latter  is 
truly  optimal,  e.g.  there  are  more  compact  translations  for 
the  query  examples  presented  in  Chapter  V.  In  considering 
pattern  scope,  this  author  contends  that  a  generalized 
algorithm  with  patterns  of  wider  scope  would  be  too  complex 
and  would  not  be  reusable  in  other  translators,  mainly 
because  the  minimization  technique  depends  on  the  target 
DML's  semantic  capabilities.  Thus,  the  author  formulated  a 
"middle-of-the-road"  solution,  embodied  in  the  Type  II 
pattern.  In  any  case,  the  search  for  a  truly  optimal 
translator  merits  further  study. 

To  others  undertaking  future  work  on  similar  trans¬ 
lators,  this  author  recommends  that  analytical  studies  be 
made  about  the  relationships  between  the  relational  algebra 
and  the  underlying  semantics  of  the  target  DML.  Thinking  in 
retrospect,  mapping  those  relationships  would  have  been 
helpful  to  this  author,  e.g.  converting  relational  algebra 
to  tuple  calculus  before  translating  into  QUEL. 


A 


,S  A  A  . 


CHAPTER  VII 
CONCLUSION 


An  algorithm  to  translate  algebraic  queries  to  QUEL  was 
presented  in  the  previous  chapters.  At  the  time  of  this 
writing,  implementations  of  the  node-per-node  algorithm  and 
the  optimal  algorithm  have  been  tested  successfully.  This 
author  is  compelled  to  point  out  that,  while  this  work 
resulted  in  a  practical  and  feasible  solution,  it  is  by  no 
means  unique  and  other  alternatives  became  evident  during 
the  course  of  the  work. 

One  of  the  issues  confronted  by  the  author  deals  with 
the  limited  expressive  power  of  the  relational  algebra. 

QUEL  has  a  great  deal  more  expressive  power,  e.g.  aggregate 
functions,  mathematical  functions,  sorting  functions,  etc., 
not  found  in  the  relational  algebra.  Consequently,  a  large 
portion  of  the  processing  power  of  the  INGRES  subsystem 
remains  unexploited.  This  author  made  some  allowances  to 
compensate  for  this,  namely,  a  non-parsing  technique  is  used 
to  build  the  predicates  for  the  QUEL  translation.  This 
technique  is  flexible  enough  to  allow  the  usage  of  aggregate 
mathematical,  and  sorting  functions  in  QUEL  predicates  and 
make  better  use  of  the  INGRES  subsystem's  processing  capabil¬ 
ities. 
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(a)  (b) 

Figure  62:  Query  tree  compression. 


When  compression  takes  place,  all  operations  in  the 
query  tree  must  be  preserved.  The  operations  specified  in 
the  nodes  that  are  deleted  must  be  integrated  into  remaining 
nodes  according  to  the  rules  presented  in  Chapter  V.  The 
six  templates  previously  presented,  combine  several  unary 
operations  with  one  binary  operation.  A  compressed  node  has 
all  the  operations  which  were  originally  at  that  node  plus 
the  left  and  right  topmost  attribute  lists  and  composite 
predicates  which  are  extracted  from  nodes  being  deleted. 
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Table  3.  Attribute  name  table  for  Example  2  (optimized). 


(attrnam) 

(relation) 

(origin) 

supno 

TEMPI 

SUPPLIER. supno 

city 

TEMPI 

SUPPLIER. city 

supno 2 

TEMPI 

SPJ. supno 

partno 

TEMPI 

SPJ.r'artno 

pro j no 

TEMPI 

SPJ.projno 

qty 

TEMPI 

SPJ. qty 

supno 

TEMP  2 

TEMPI. supno 

city 

TEMP  2 

TEMPI. city 

supno 2 

TEMP  2 

TEMPI . supno2 

partno 

TEMP  2 

TEMP 1 . partno 

pro j no 

TEMP  2 

TEMPI .projno 

qty 

TEMP  2 

TEMP 1 . qty 

pro jno2 

TEMP  2 

PROJECT. projno 

city2 

TEMP  2 

PROJECT. city 

city 

TEMP  3 

TEMP 2 .city 

partno 

TEMP  3 

TEMP 2 .partno 

city2 

TEMP  3 

TEMP2 . city2 
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